When a server is unresponsive or slow, one of the first reactions is to run 'uptime' or 'top'. Most users, and unfortunately many sysadmins, stop
there and conclude that the CPU is too slow. A lot of the time, in a webhosting environment with control panels, logging, and statistics
generation such as awstats, analog, and webalizer running, the fault lies in insufficient disk IO. However, sometimes we do see the CPU being the
bottleneck. How do we know the CPU really is working like mad, and is not just waiting for disk IO to complete? Simple: run 'top' or 'iostat' and
check the iowait. It should not remain high. How high is high? That is left as an exercise. There are several common scenarios in webhosting and
on dedicated servers with cPanel where CPU usage gets high, applications do not get the CPU time they need, and the user's experience
suffers.
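If you would rather script the iowait check than eyeball 'top', here is a minimal sketch in Python. It assumes a Linux box where the first
line of /proc/stat is the aggregate "cpu" line with the fields in the order user, nice, system, idle, iowait, irq, softirq, steal. The function
names are made up for this example, and what counts as "high" is still up to you.

```python
import time


def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Only the first 8 counters are summed (user, nice, system, idle, iowait,
    irq, softirq, steal); the guest counters that may follow are already
    included in user and nice.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def measure_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the iowait percentage."""
    def sample():
        with open(path) as f:
            return parse_cpu_line(f.readline())

    first = sample()
    time.sleep(interval)
    return iowait_percent(first, sample())
```

Calling measure_iowait() a few times in a row shows whether iowait stays high or was only a short spike. If it stays low while the CPU is
pegged, the CPU really is the bottleneck.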
1. Runaway processes
Sometimes user applications are buggy and just keep running. Something as simple as while (1){}; will do it: if the web site gets just a few
hits a second, the server will crawl real soon, and then it goes into a death spiral. First the CPU runs flat out, then the processes pile up,
and there are more and more httpd and php processes taking up more and more RAM. RAM gets used up, processes start to swap out to disk, and it
goes further downhill from there.
One really cute scenario I encountered along these lines: the user application calls itself. Something like "curl http://usersite/a.php" inside
a.php itself.
Of course, it's possible to limit the number of processes owned by a user, the total number of httpd processes, the amount of memory
taken, CPU time, etc. That is out of scope for this entry, though. I am only trying to describe CPU bound processes here, so I will just list
keywords that will help if you need to look further: ulimit, /etc/security/limits.conf, httpd.conf
Just chmod 000 the offending files of the user accounts that are running crazy, and kill -9 the processes. Also run crontab -l -u userid to see
if there are any cronjobs running.
If you are in a new job, have just taken over a server, and there are processes running which you can't find in /etc/init.d, check /etc/cron.*.
These are all CentOS paths and filenames, so YMMV if you are using other flavours. That's why we try to stick to one flavour of linux (and unix)
as far as possible. If you have freebsd, redhat, debian, solaris, tru64, aix etc in one shop.. good for your resume, not too good for your sanity :)
2. Deliberate hacking attempt
Many web applications are unfortunately buggy and easily exploited with the exploit scripts that get distributed around, so shared web servers
are easily hacked in some way or other. The guys who hack the site then run scripts which keep their irc processes alive, port scan, or send out
UDP floods. A lot of these scripts are very buggy, worse than the web applications, and they tend to take up 100% of CPU. This makes them easy
to detect when they show up in 'top'. A few things need to be done: trace which application is vulnerable (modsecurity can be very helpful here,
with auditing enabled); disable perl for the nobody user, eg. using setfacl (unfortunately, this breaks some cPanel functions); and set up
iptables to disallow outgoing traffic to unknown ports, especially UDP if it is not required. Again, there are lots of topics here which are out
of scope.
3. Normal processes such as mysql or apache just taking longer and longer to complete as your sites get busier.
my.cnf, the mysql config file, as it comes with cPanel is not good for busy sites. If you run mysqladmin status and see lots of slow queries, or
run mysqladmin processlist and see lots of queries taking more than 1 second or so to complete, it's time to look into my.cnf and see if you
can tweak it a bit, usually by giving it lots of cache. Google for my.cnf optimizations and you should get lots of hits. Also check that your
databases are properly indexed. How about apache? A really good thing to do there is to install eaccelerator, which caches compiled php scripts.
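Picking the slow queries out of a long processlist by eye gets tedious on a busy box. Here is a rough sketch that filters them, assuming the
usual boxed table that mysqladmin processlist prints, with a header row containing the columns User, db, Command, Time and Info. The function
name and the default threshold of 1 second are choices made for this example.

```python
def slow_queries(processlist_output, threshold_seconds=1):
    """Return (seconds, user, db, query) tuples for running queries over the threshold.

    Results are sorted so the longest-running query comes first.
    """
    rows = [ln.strip() for ln in processlist_output.splitlines()
            if ln.strip().startswith("|")]
    if not rows:
        return []
    header = [h.strip() for h in rows[0][1:-1].split("|")]
    slow = []
    for row in rows[1:]:
        # Limit the split so a '|' inside the query text stays in the last column.
        cells = [c.strip() for c in row[1:-1].split("|", len(header) - 1)]
        record = dict(zip(header, cells))
        if record.get("Command") != "Query":
            continue  # ignore sleeping or idle connections
        try:
            seconds = int(record.get("Time", ""))
        except ValueError:
            continue
        if seconds > threshold_seconds:
            slow.append((seconds, record.get("User", ""),
                         record.get("db", ""), record.get("Info", "")))
    return sorted(slow, reverse=True)
```

If the same query shape keeps turning up in this list, check that table's indexes before touching my.cnf.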
4. Application upgrade goes wrong
An application such as moodle can go from working smoothly to taking up all your CPU from one version to the next, due to an upgrade going wrong,
usually because of differences in the database tables. If you can, it's really best to do a brand new installation and migrate the data, rather
than upgrade in place.
Lim Wee Cheong
01 Jan 2008